Abstract: Mitochondrial genomes have been extensively studied for phylogenetic purposes and to investigate intra- and
interspecific genetic variations. In recent years, numerous groups have undertaken sequencing of platyhelminth
mitochondrial genomes. Haplorchis taichui (family Heterophyidae) is a trematode that infects humans and animals mainly
in Asia, including the Mekong River basin. We sequenced and determined the organization of the complete mitochondrial
genome of H. taichui. The mitochondrial genome is 15,130 bp long, containing 12 protein-coding genes, 2 ribosomal RNAs
(rRNAs, a small and a large subunit), and 22 transfer RNAs (tRNAs). Like other trematodes, it does not encode the atp8
gene. All genes are transcribed from the same strand. The ATG initiation codon is used for 9 protein-coding genes, and
GTG for the remaining 3 (nad1, nad4, and nad5). The mitochondrial genome of H. taichui has a single long non-coding
region between trnE and trnG. In the phylogenetic analysis, H. taichui was more closely related to Opisthorchiidae than
to other trematode groups, with maximal support. Our results could provide a resource for the comparative mitochondrial
genome analysis of trematodes, and may yield genetic markers for molecular epidemiological investigations into
intestinal flukes.

Key words: Haplorchis taichui, trematode, mitochondrial genome, molecular phylogeny



INTRODUCTION 

The intestinal trematode Haplorchis taichui is a medically important parasite infecting humans and
livestock. Haplorchiasis is a major public health threat in Asia and in parts of Africa and the
Americas [1-3]. H. taichui is the most frequently reported species among the minute intestinal flukes
from Southeast Asia, including Thailand, Lao PDR, China, and Vietnam [3,4-7]. Mitochondrial (mt)
genomes exhibit a relatively conserved suite of protein-coding sequences, but also relatively rapid
rates of evolutionary change [8,9]. In recent years, complete mitochondrial DNA (mtDNA) sequences have
been extensively used to infer higher level phylogenies [10,11] and also for taxonomy and population
genetics at lower taxonomic levels [12-14]. To date, complete mt genomes of many metazoan species,
including helminths, have been deposited in GenBank and published [15]. Information from flatworm
mitochondrial genomes is strongly biased toward parasitic species of medical importance. For this
reason, recent mitochondrial genome scale phylogenetic surveys have emphasized the need to collect
data for the major groups of flatworms that have not been sampled [16,17]. However, most of these
groups remain poorly understood at the molecular level, in particular with respect to the complete mt
genomes of species in the family Heterophyidae. Parasitic flatworm mt genomes usually range in size
from 13 to 14 kb, although some are far larger (up to 24 kb); they are typically circular and usually
encode 36 genes, including 12 protein-coding genes, without introns and with short intergenic regions
[18]. The Digenea currently contains about 18,000 nominal species that parasitize vertebrates,
sometimes with humans as the definitive host [19]. The purpose of the present study was to sequence
the mt genome of H. taichui for comparison with the organization and sequence of the mt genomes of
other trematodes. In addition, we wished to reconstruct the phylogenetic relationships of the family
Heterophyidae within the class Trematoda using mtDNA sequences.

MATERIALS AND METHODS 

Long PCR amplification and sequencing of the H. taichui
mtDNA molecule

Adult H. taichui worms were obtained from naturally infected Laotian people during the activity of
"Korea-Lao PDR Collaborative Project for Control of Foodborne-Trematode Infections (esp.
Opisthorchiasis) in Lao PDR (2007-2011)". The specimens were washed in normal physiological saline and
identified based on morphological characters (gonotyl bears 12-16 spines in H. taichui). The worms
were stored in 70% ethanol prior to DNA extraction. Total genomic DNA was extracted from 200 specimens
using a QIAamp tissue kit (Qiagen Inc., Valencia, California, USA) according to the manufacturer's
instructions, and used as the template DNA. The complete mt genome was PCR-amplified in overlapping
fragments. The nucleotide sequences and relative positions of the PCR primers are shown in Fig. 1. PCR
reactions were performed in a 50 µl reaction volume consisting of 10 units of EF-Taq polymerase
(Solgent Co., Daejeon, Korea), 2.5 mM dNTP mixture, 2.5 mM MgCl2, 20 pmole of each primer, and 10 pg
of genomic DNA in a thermocycler (Biometra Co., Goettingen, Germany) under the following conditions:
92°C for 2 min (initial denaturation); then 10 cycles of 92°C for 10 sec (denaturation), 50°C for 30
sec (annealing), and 68°C for 2 min to 10 min (extension); followed by 92°C for 2 min and 20 cycles of
92°C for 10 sec, 48°C for 30 sec, and 68°C for 2 min to 10 min; and a final extension at 68°C for 8
min.

A negative control (no template) was also included for every PCR reaction to detect contamination.
Each amplicon (3 µl) was examined by agarose (1%) gel electrophoresis, stained with ethidium bromide,
and photographed using a gel documentation system (LMtec, Cambridge, UK). The bands were excised under
long-wavelength UV light, extracted using a Doc-do purification kit (Elpis Co., Daejeon, Korea), and
then used as templates for sequencing reactions. The primer walking method was employed to obtain
overlapping sequences for each of the amplified fragments. Cycle sequencing from both ends of the
fragments was performed with Big-Dye, and the amplified products were subjected to electrophoresis on
an ABI 3100 automated DNA sequencer.

Primers          Primer sequence (5'-3')         Direction   Location
PLND 2-294-F     TDMGTTTGGWKTDTTYCCDTTT          Forward     nad2
PLND 2-834-R     ARAWAAARYTGYTCWRAAAANCTATANA    Reverse     nad2
PLND1-F          KCGTAAGGGBCCWAAHAAGGTTGG        Forward     nad1
PLND1-R          AATCATAACGAAYACGHGGA            Reverse     nad1
PL16S-F          WYYGTGCDAAGGTAGCATAAT           Forward     rrnL
PL16S-R2         AWAGATAAGAACCRACCTGGCT          Reverse     rrnL
PL12S-F          CAGTGCCAGCAKCYGCGGTTADWCTG      Forward     rrnS
PL12S-R          AYCSWGPKTGWCGGGCGRTRTGTAC       Reverse     rrnS
PLND 5-578-F     ATGCGKGCYCCNACNCCNGTWAGTTC      Forward     nad5
PLND 5-1065-R    ARARMATGGTTMSTAAAAWABAM         Reverse     nad5

Fig. 1. A map of the complete mitochondrial genome of Haplorchis taichui. The primer sequences used
for amplification of respective mitochondrial genes are indicated in the map.

Sequence analysis and characterization of the Haplorchis 
taichui mt genome 

Gene annotation, genome organization, translation initiation and termination codons, and the
boundaries between protein-coding genes of the mt genome were identified by comparison with the mt
genomes of other trematodes reported previously [17,18]. Sequences were assembled manually and
aligned, using BLAST searches, against our own complete mt genome sequence of Metagonimus yokogawai
(to be published elsewhere) and those of trematode parasites available in the GenBank database
(http://www.ncbi.nlm.nih.gov WebGenBank). Open reading frames and gene boundaries were confirmed by
comparison with the M. yokogawai nucleotide sequences. The codon usage profiles of the 12
protein-coding genes and their nucleotide composition were calculated using Geneious 6.1.5 (Biomatters
Co., Auckland, New Zealand) [20]. Putative secondary structures of the 22 tRNA genes were identified
manually by recognizing potential secondary structures and anticodon sequences.
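The nucleotide composition and codon usage calculations described above were done in Geneious; the same quantities can be sketched in a few lines of Python. This is a minimal illustration only: the function names `base_composition` and `codon_usage` and the toy sequence are assumptions made for the example, not part of the original analysis.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Return the percentage of each base (A, C, G, T) in a DNA sequence."""
    seq = seq.upper()
    counts = Counter(b for b in seq if b in "ACGT")
    total = sum(counts.values())
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def codon_usage(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence whose length is a multiple of 3."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


# Hypothetical 5-codon sequence used only for illustration (not H. taichui data).
toy_cds = "ATGTTTTTTGTGTAG"
comp = base_composition(toy_cds)
usage = codon_usage(toy_cds)
print(comp)
print(usage.most_common(3))
```

Applied to the concatenated protein-coding genes, functions of this kind would yield figures comparable to the base composition and codon counts reported in the Results.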
Phylogenetic analysis

To assess the phylogenetic position of H. taichui and the utility of mt genomes in resolving the
interrelationships of trematode groups, complete mitochondrial genome sequences of 15 flatworms were
analyzed. The mtDNA sequences were as follows: H. taichui (KF214770), Trichobilharzia regenti
(NC_009680), Schistosoma spindale (DQ157223), S. haematobium (DQ157222), S. mansoni (NC_002545), S.
japonicum (NC_002544), S. mekongi (NC_002529), Fasciola hepatica (NC_002546), Paragonimus westermani
(NC_002354), Opisthorchis felineus (EU921260), O. viverrini (JF739555), Clonorchis sinensis
(FJ381664/JF729303/JF729304), and 1 Monogenea species (Gyrodactylus thymalli, NC_009682) as the
outgroup. Each gene was translated into an amino acid sequence using the trematode mt genetic code
(translation table 21) in Geneious 6.1.5, and was aligned based on its amino acid sequence using
default settings. Conserved blocks of the concatenated alignment were selected using the Gblocks
program [21] (http://molevol.cmima.csic.es/castresana/Gblocks_server) for the 12 protein-coding genes
examined. Phylogenetic relationships among trematodes were inferred using different tree construction
algorithms, i.e., maximum parsimony (MP), neighbor-joining (NJ), maximum likelihood (ML), and Bayesian
phylogeny (BP) [22], implemented in PAUP* [23], PhyML 3.0 [24], and MrBayes [25]. Bootstrap analysis
was done using 1,000 random replications. Models of amino acid substitution were determined for each
data partition independently using ModelTest supported in Geneious 6.1.5. The MP analysis was
performed using the exhaustive search option. ML analysis was performed using the Le and Gascuel (LG)
substitution model. In BP analysis, the following settings were applied: the number of cycles =
1,100,000; sampling frequency = 200; heated chains = 4; burn-in = 100,000.
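The translation step with genetic code table 21 can be illustrated with a short, self-contained Python sketch. The 64-letter string below follows the NCBI layout for table 21 as we understand it (TGA = Trp, ATA = Met, AAA = Asn, AGA and AGG = Ser); it should be checked against the NCBI genetic code listing before any real use. The names `TABLE_21`, `CODON_TO_AA`, and `translate` are hypothetical, and this is not the Geneious implementation used in the study.

```python
BASES = "TCAG"
# Amino acids for codons in the order TTT, TTC, TTA, TTG, TCT, ... (NCBI layout).
# Assumed to correspond to NCBI translation table 21.
TABLE_21 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNNKSSSSVVVVAAAADDEEGGGG"

CODON_TO_AA = {
    b1 + b2 + b3: TABLE_21[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence up to (not including) the first stop codon.

    Codons containing ambiguous bases are rendered as 'X'. An alternative
    initiator such as GTG is translated literally (as V) in this sketch.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TO_AA.get(cds[pos:pos + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy sequence: ATG TGA AAA AGA TAG
print(translate("ATGTGAAAAAGATAG"))
```

The toy sequence exercises three codons (TGA, AAA, AGA) that are read differently under the standard code, which is why the choice of translation table matters for the amino acid alignments.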

RESULTS 

General features of the mt genome of H. taichui 

The complete mt genome of the H. taichui Lao PDR isolate (GenBank accession no. KF214770) is 15,130 bp
in length (Fig. 1). It encodes 36 genes: 12 protein-coding genes (cox1-3, nad1-6, nad4L, atp6, and
cob; atp8 is lacking), 22 transfer RNA genes, and 2 ribosomal RNA genes. The relative positions and
lengths of the genes are given in Table 1. An AT-rich region (1,710 bp) is located between tRNA-Glu
and tRNA-Gly. All genes are transcribed in the same direction. The 2 adjacent genes, nad4L and nad4,
overlap each other by 40 nt in different reading frames.
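The 40 nt overlap can be checked directly from the gene coordinates listed in Table 1 (nad4L at 1843-2106, nad4 at 2067-3347). The following is a small sketch; the helper name `overlap_nt` is hypothetical.

```python
def overlap_nt(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of positions shared by two 1-based, inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


nad4l = (1843, 2106)  # coordinates from Table 1
nad4 = (2067, 3347)   # coordinates from Table 1

shared = overlap_nt(nad4l, nad4)
# A start-to-start offset that is not a multiple of 3 means different reading frames.
frame_offset = (nad4[0] - nad4l[0]) % 3

print(shared)        # 40
print(frame_offset)  # 2
```

The non-zero frame offset agrees with the statement that the two genes overlap in different reading frames.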



Table 1. Position and characteristics of protein-coding and non-coding sequences in the mt genome of
Haplorchis taichui

Gene/Region   No. of        No. of        Initiation   Termination   Positions
              nucleotides   amino acids   codon        codon         (5'-3')
cox3          657           218           ATG          TAG           1-657
tRNA-H        66                                                     660-725
cob           1,110         369           ATG          TAG           732-1841
nad4L         264           87            ATG          TAG           1843-2106
nad4          1,281         426           GTG          TAA           2067-3347
tRNA-Q        66                                                     3357-3422
tRNA-F        62                                                     3427-3488
tRNA-M        64                                                     3489-3552
atp6          516           171           ATG          TAG           3553-4068
nad2          870           289           ATG          TAA           4094-4963
tRNA-V        60                                                     4968-5027
tRNA-A        63                                                     5029-5091
tRNA-D        67                                                     5094-5160
nad1          906           301           GTG          TAG           5161-6066
tRNA-N        66                                                     6069-6134
tRNA-P        64                                                     6139-6202
tRNA-I        65                                                     6199-6263
tRNA-K        65                                                     6267-6331
nad3          360           119           ATG          TAG           6332-6691
tRNA-S        62                                                     6705-6766
tRNA-W        64                                                     6774-6837
cox1          1,542         513           ATG          TAG           6842-8383
tRNA-T        61                                                     8385-8445
rrnL          979                                                    8446-9424
tRNA-C        64                                                     9426-9489
rrnS          747                                                    9490-10236
cox2          624           207           ATG          TAG           10237-10860
nad6          459           152           ATG          TAG           10847-11305
tRNA-Y        67                                                     11306-11372
tRNA-L1       66                                                     11370-11435
tRNA-S2       64                                                     11434-11497
tRNA-L2       70                                                     11504-11573
tRNA-R        64                                                     11585-11648
nad5          1,587         528           GTG          TAA           11650-13236
tRNA-E        73                                                     13248-13320
NR            1,710                                                  13323-15032
tRNA-G        59                                                     15035-15093


Codon usage and protein-coding genes

The mt genome of H. taichui encodes 12 protein-coding genes, as in other trematodes. The start and
termination codons of these genes were identified by sequence comparison with homologs in other
trematodes. The ATG codon was used in 9 protein-coding genes, and the GTG codon in 3 genes (nad1,
nad4, and nad5). The TAG stop codon was used in 9 genes (cox3, cob, nad4L, atp6, nad1, nad3, cox1,
cox2, and nad6) and the TAA termination codon in the remaining 3 genes (nad4, nad2, and nad5) (Table
1). Pairwise comparisons were made among the amino acid sequences inferred from individual
protein-coding genes in the H. taichui genome and those of 11 other trematodes (Table 2). The amino
acid sequence similarities in individual inferred proteins ranged from 76.8% (cox1) to 82.4% (cob)
between H. taichui and O. felineus, and from 78.0% (cox1) to 80.5% (cob) between H. taichui and C.
sinensis. The amino acid sequence similarities between H. taichui and S. japonicum ranged from 24.2%
(cox3) to 67.9% (cox1), and those with F. hepatica from 47.1% (nad2) to 75.6% (cob) (Table 2). The 12
protein-coding genes were 10,176 bp in length and composed of 43% T, 17.1% A, 28% G, and 11.9% C,
accounting for 67.3% of the full length of the genome (Table 3). All 64 codons were used (Table 4).
However, some codons, such as CGC and CGA for arginine, ACA for threonine, or GCA for alanine, were
very uncommon, reflecting the nucleotide composition. Several amino acids, histidine (1.6%), lysine
(1.3%), and glutamine (0.7%), were rarely used. The 5 most frequently used amino acids were leucine
(16.7%), valine (11.9%), serine (10.4%), phenylalanine (9.9%), and glycine (9.0%). These collectively
constituted 57.9% of the total number of amino acids. Amino acids encoded by T-rich codons (≥ 2 Ts in
a triplet) were the most abundant and accounted for 42.7% of the total amino acid composition, whereas
C-rich codons (≥ 2 Cs in a triplet) were the least used (they accounted for merely 5.0% of the total
amino acid composition). As shown in Table 4, unequal usage of synonymous codons avoiding C at the
third codon position was prominent in most cases; for instance, the relative frequency of using TTT
for Phe was 9.2%, but the frequency of using TTC was only 1.4%.

Table 2. Properties of trematode mtDNA protein-coding genes and similarity comparison between
Haplorchis taichui and other trematodes

Gene    H.t    S.s    S.h    S.m    S.j    S.ml   S.mk   F.h    P.w    O.f    O.v    C.s

No. of amino acids
cox3    218    221    221    218    215    217    217    214    215    214    214    213
cob     369    364    367    365    372    373    373    371    373    371    369    370
nad4L   87     84     86     87     88     88     88     91     86     87     87     87
nad4    426    420    421    420    425    424    424    424    402    425    425    425
atp6    171    173    174    174    173    174    174    173    171    171    171    171
nad2    289    279    279    280    285    284    284    289    289    289    289    290
nad1    301    291    298    297    297    298    296    301    297    300    300    300
nad3    119    122    122    121    120    121    121    119    119    118    118    118
cox1    513    587    601    511    509    >433   511    511    498    520    516    519
cox2    207    200    198    198    203    u      233    201    200    212    214    211
nad6    152    155    157    150    153    u      154    151    151    153    153    153
nad5    528    528    527    528    529    >333   531    523    528    534    534    534

Amino acid similarity (%)
cox3    100    28.7   28.2   27.3   24.2   30.0   28.2   49.5   54.0   49.5   50.0   49.3
cob     100    48.5   51.1   52.9   54.9   53.0   53.8   75.6   75.9   82.4   77.8   80.5
nad4L   100    32.9   322.6  32.6   32.2   31.0   33.3   64.4   67.1   72.4   73.6   71.3
nad4    100    34.0   30.9   32.8   29.4   30.4   31.3   50.7   52.7   59.6   59.1   58.9
atp6    100    38.5   34.5   33.9   39.9   41.4   38.7   62.8   58.5   64.9   66.1   67.8
nad2    100    32.3   30.6   33.3   32.3   33.3   34.4   47.1   46.4   51.9   53.3   52.2
nad1    100    41.1   40.2   43.1   46.8   42.1   43.1   69.9   69.6   72.6   70.9   70.2
nad3    100    32.8   37.0   37.3   37.0   41.7   37.0   63.0   60.5   67.2   62.2   63.9
cox1    100    68.0   68.0   68.2   67.9   nc     66.5   74.7   78.5   76.8   78.9   78.0
cox2    100    44.6   42.6   45.8   45.4   nc     41.6   48.8   48.5   59.5   58.5   61.0
nad6    100    34.0   29.    34.2   35.7   nc     36.4   51.0   54.0   57.2   60.5   57.2
nad5    100    34.0   32.8   31.2   32.5   nc     32.4   52.0   50.3   46.5   44.6   46.0

Inferred initiation/termination codon (a)
cox3    A/G    A/A    A/G    G/G    A/G    A/A    A/G    A/G    A/G    A/G    A/G    A/G
cob     A/G    A/G    A/A    G/G    A/G    A/A    A/A    A/G    A/G    A/G    A/G    A/G
nad4L   A/G    A/A    A/A    A/A    A/A    A/A    A/A    G/G    G/G    A/G    A/G    A/G
nad4    G/A    A/A    A/G    A/A    A/G    A/A    A/A    G/A    A/G    A/G    A/G    G/G
atp6    A/G    A/A    A/G    A/G    A/A    A/G    A/A    A/G    A/G    A/G    A/G    A/G
nad2    A/A    A/A    A/A    G/A    A/G    A/G    A/A    A/G    A/A    A/G    A/G    G/G
nad1    G/G    A/A    A/G    G/G    A/G    A/A    A/A    G/G    A/G    G/G    G/G    G/G
nad3    A/G    A/A    A/A    A/G    A/G    A/G    A/G    A/G    A/G    G/G    G/G    G/G
cox1    A/G    A/A    A/G    A/G    G/G    A/nc   A/A    A/G    A/G    G/G    G/G    G/A
cox2    A/G    A/A    A/G    A/A    A/A    nc/nc  A/A    A/G    A/G    A/G    A/G    A/G
nad6    A/G    A/G    A/A    A/A    A/G    nc/nc  G/A    A/G    A/G    A/G    A/G    G/A
nad5    G/A    A/G    A/G    A/G    A/G    nc/G   G/A    G/G    G/G    A/G    A/A    G/A

Inferred initiation codons have not been determined for some genes and species (u, undetermined), and
other genes for S. malayensis have yet to be characterized (nc, not characterized), giving rise to
partial lengths for cox1 and nad5. H.t, Haplorchis taichui; S.s, Schistosoma spindale; S.h, S.
haematobium; S.m, S. mansoni; S.j, S. japonicum; S.ml, S. malayensis; S.mk, S. mekongi; F.h, F.
hepatica; P.w, Paragonimus westermani; O.f, Opisthorchis felineus; O.v, O. viverrini; C.s, Clonorchis
sinensis. (a) A or G (TG)/(TA) A or G.

Table 3. Nucleotide content of protein-coding genes from complete or almost complete mitochondrial
genomes of flatworms

Species                      GenBank      Base composition (%)                 Total bp   Total no. of
                                          T      C      A      G      A+T                codons
Schistosoma spindale         DQ157223     45.2   7.0    28.1   19.7   73.3    10,308     3,424
Schistosoma haematobium      DQ157222     44.9   7.6    27.8   19.5   72.7    10,389     3,451
Schistosoma mansoni          NC_002545    45.6   8.2    23.3   23.0   68.9    10,083     3,349
Schistosoma japonicum        NC_002544    48.3   8.0    23.0   20.7   71.3    10,143     3,369
Schistosoma malayensis (a)   AAG60031     48.8   6.6    23.6   20.9   72.4    8,271      2,745
Schistosoma mekongi          NC_002529    48.4   6.7    24.3   20.6   72.7    10,254     3,406
Fasciola hepatica            NC_002546    49.4   9.6    14.2   26.8   63.6    10,140     3,368
Paragonimus westermani       NC_002354    38.3   17.9   13.2   30.6   51.5    10,023     3,329
Opisthorchis felineus        EU921260     45.3   12.1   15.3   27.2   60.6    10,217     3,394
Opisthorchis viverrini       JF739555     44.9   12.3   15.5   27.3   60.4    10,206     3,390
Clonorchis sinensis          FJ381664     45.1   11.9   15.7   27.3   60.8    10,209     3,391
Haplorchis taichui           KF214770     43.0   11.9   17.1   28.0   60.1    10,176     3,380

(a) S. malayensis is an incomplete mt genome.

Table 4. Nucleotide codon usage for 12 protein-encoding genes of the mitochondrial genome of
Haplorchis taichui

[Table 4 is incomplete in this copy; only the entries that survive intact are listed.]

AA     Codon     No.    %
Phe    UUU(F)    308    9.19
Phe    UUC(F)    48     1.43
Ser    UCU(S)    96     2.87
Tyr    UAU(Y)    136    4.06
Cys    UGU(C)    98     3.93
Leu    CUA(L)    22     0.66
Leu    CUG(L)    52     1.55
Pro    CCA(P)    13     0.39
Pro    CCG(P)    15     0.45
Gln    CAA(Q)    8      0.24
Gln    CAG(Q)    17
Arg    CGC(R)    7      0.21
Arg    CGA(R)    8      0.24
Val    GUC(V)    32
Ala    GCU(A)    70     2.09
Ala    GCA(A)    10     0.30
Asp    GAU(D)    51
Gly    GGA(G)    49     1.46

AA     Ab    No.    %
Ala    A     146    4.3
His    H     54     1.6
Ile    I     107    3.2
Val    V     402
Tyr    Y     175    5.2

AA, Amino acid; Ab, Abbreviation; No., Number of codons.

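Two of the figures quoted in this section can be re-derived with simple arithmetic: the share of the genome occupied by protein-coding genes (10,176 of 15,130 bp), and the split between the two synonymous Phe codons using the counts given in Table 4 (308 for TTT/UUU and 48 for TTC/UUC). The sketch below is illustrative only; the helper name `relative_usage` is an assumption, and note that the percentages in the text are expressed relative to all codons, whereas this helper normalizes within one amino acid.

```python
def relative_usage(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of each codon among the synonymous codons of one amino acid."""
    total = sum(counts.values())
    return {codon: 100.0 * n / total for codon, n in counts.items()}


# Codon counts for phenylalanine as reported in Table 4.
phe = {"TTT": 308, "TTC": 48}
share = relative_usage(phe)

# Protein-coding length and total genome length as reported in the text.
coding_fraction = 100.0 * 10_176 / 15_130

print({codon: round(pct, 1) for codon, pct in share.items()})
print(round(coding_fraction, 1))
```

Normalizing within the amino acid makes the third-position bias explicit: roughly six out of seven Phe codons end in T rather than C.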
Transfer RNA genes, ribosomal RNA genes, and non-coding regions

The 22 tRNA genes identified in the mt genome of H. taichui ranged from 60 to 73 nucleotides (nt) in
length. Of the 22 tRNA genes, 20 could be folded into the conventional cloverleaf structure, including
a 7 bp amino-acyl stem, a 2-4 bp DHU arm with a 3-10 nt loop, a 5 bp anticodon stem with a loop of 7
nt, and a 2-8 bp TψC arm with a 3-10 nt loop. The 2 tRNAs specifying serine were exceptions (Fig. 2).
The rrnL was located between tRNA-Thr and tRNA-Cys, and rrnS was located between tRNA-Cys and cox2.
The rrnS and rrnL of H. taichui were 747 nt and 979 nt in length, respectively. Each ribosomal gene
was assumed to directly abut neighboring genes. The A+T content of rrnL was 58.5%, and that of rrnS
was 55.9%. Just 1 long non-coding region (LNR) was identified, located between tRNA-Glu and tRNA-Gly;
it lacked any tandem repeats. Its size was 1,710 bp, comprising 11.3% of the genome, and its A+T
content was 58.3%.
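As a quick consistency check, the size of the long non-coding region and its share of the genome follow from the coordinates in Table 1 (13323-15032) and the genome length of 15,130 bp. The variable names below are chosen for this illustration only.

```python
genome_bp = 15_130                     # total genome length reported in the text
lnr_start, lnr_end = 13_323, 15_032    # NR coordinates from Table 1 (1-based, inclusive)

lnr_bp = lnr_end - lnr_start + 1
lnr_percent = 100.0 * lnr_bp / genome_bp

print(lnr_bp)                 # 1710
print(round(lnr_percent, 1))  # 11.3
```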

Phylogenetic analysis 

A concatenated alignment set of 3,380 homologous amino acid positions from conserved blocks was used.
Phylogenetic relationships among the 15 flatworms inferred using different analytical approaches (MP,
ML, NJ, and BP methods) were the same in their topology (Fig. 3). Phylogenetic relationships among
species were well resolved, with very high nodal support throughout. In this tree, Schistosomatidae
and (Fasciolidae + Paragonimidae + Opisthorchiidae + Heterophyidae) formed monophyletic groups. H.
taichui was resolved as sister to Opisthorchiidae with very high support in the phylogenetic analysis.

Fig. 2. DNA sequences for 22 tRNA genes of H. taichui mtDNA folded into inferred secondary structures.

Fig. 3. Phylogenetic relationships among trematode parasites based on inferred amino acid sequences of
12 mitochondrial protein-coding gene loci for 14 individuals of 11 species, using G. thymalli (GenBank
accession no. NC_009682) as the outgroup, by MP/ML/NJ/BP, respectively (branch length for ML).



DISCUSSION 

The mt genome arrangement of H. taichui was the same as that of Fasciola hepatica (NC_002546),
Paragonimus westermani (NC_002354) and Opisthorchis spp., but distinct from the arrangement seen in
some Schistosoma spp. [9,18]. All genes were transcribed in the same direction, as in other flatworms
for which data are available [26]. H. taichui lacks atp8, a gene that has not been found in any
flatworm species [18]. The majority of metazoan mtDNA sequences contain 2 non-coding regions of
significantly different size, a long non-coding region (LNR) and a short non-coding region (SNR). In
the case of H. taichui, a single long non-coding region was present. The nucleotide composition of the
entire mtDNA of H. taichui was biased toward T and G, with T being the most common nucleotide and C
the least favored, in accordance with mt genomes of other trematodes except for P. westermani (Table
3). Two genes, nad4L and nad4, overlapped by 40 nucleotides (Table 1), similar to the situation in
other digeneans. The tRNA genes generally resemble those of other digeneans. A standard cloverleaf
structure can be inferred for most tRNAs. The exceptions are the tRNA-S genes, in which the paired
dihydrouridine (DHU) arm is missing in all parasitic flatworm species [18], although secondary
structures including this arm are feasible for some species [15] including H. taichui. The rrnS was
747 nt in length, shorter than its homologs from other trematodes except for S. mansoni (744 nt) and
P. westermani (744 nt, not registered on GenBank). The rrnS of S. mekongi was noted as being 709 nt
(GenBank accession no. NC_002529), but it could be 39 nt longer if it directly abuts cox1 in that
species. The rrnL of H. taichui, at 979 nt, was the shortest recorded among trematodes to date.
Morphological data have traditionally been used for taxonomic studies on flatworms. Such data are now
being supplemented by data from ultra-structural and biochemical studies [27,28] and, increasingly,
from molecular sequences. Published phylogenies using nuclear ribosomal genes [19,29] indicate that
the Heterophyidae is paraphyletic with respect to the Opisthorchiidae. Our mitochondrial sequences
have yielded a tree consistent with this finding, with H. taichui seen as sister to Clonorchis +
Opisthorchis (family Opisthorchiidae). Sequences from additional heterophyid and opisthorchiid mt
genomes will be required to confirm the findings from nuclear genes.

In conclusion, the present study reported the complete 
mtDNA sequence and genome organization of H. taichui for 
the first time. Its constituent genes were compared with homo- 
logs from other trematodes. Our phylogenetic analysis of con- 
catenated protein-coding genes supports a sister group rela- 
tionship between families Opisthorchiidae and Heterophy- 
idae. These data will provide tools for the molecular diagnosis 
of haplorchiasis and for studies on the biology of the species. 